
A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification

Authors: Ma Shuming, Sun Xu
Publication date: 06/10/2017

Text summarization and text simplification are two major ways to simplify text for poor readers, including children, non-native speakers, and the functionally illiterate. Text summarization produces a brief summary of the main ideas of a text, while text simplification aims to reduce the linguistic complexity of a text while retaining its original meaning. Recently, most approaches to text summarization and text simplification have been based on the sequence-to-sequence model, which has achieved much success in many text generation tasks. However, although the generated simplified texts are similar to the source texts on the surface, they have low semantic relevance. In this work, our goal is to improve the semantic relevance between source texts and simplified texts for text summarization and text simplification. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms the state-of-the-art systems on two benchmark corpora.

arXiv.org e-Print Archive
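The abstract above states that the similarity score between the source representation and the summary representation is maximized during training. Below is a minimal sketch of that kind of objective. It assumes cosine similarity as the score and uses a hypothetical `weight` to trade it off against the usual generation loss (a negative log-likelihood). The abstract does not specify the actual scoring function or weighting.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_augmented_loss(nll: float,
                             source_repr: Sequence[float],
                             summary_repr: Sequence[float],
                             weight: float = 1.0) -> float:
    """Hypothetical objective: generation loss minus a weighted similarity
    bonus, so minimizing the loss pushes the two representations together."""
    return nll - weight * cosine_similarity(source_repr, summary_repr)


# Parallel representations give similarity 1.0, so the loss drops by `weight`.
print(relevance_augmented_loss(2.0, [1.0, 0.0], [2.0, 0.0], weight=0.5))
```

Subtracting the similarity, rather than adding a separate term to maximize, lets a standard minimizer handle both goals in a single scalar loss.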

Lock-Free Parallel Perceptron for Graph-based Dependency Parsing

Authors: Ma Shuming, Sun Xu
Publication date: 02/03/2017

Dependency parsing is an important NLP task. A popular approach to dependency parsing is the structured perceptron. However, graph-based dependency parsing has a time complexity of $O(n^3)$, so training is slow. To deal with this problem, we propose a parallel algorithm called parallel perceptron. The parallel algorithm can make full use of a multi-core computer, which saves a lot of training time. Experiments show that dependency parsing with parallel perceptron trains 8 times faster than traditional structured perceptron methods when using 10 threads, with no loss in accuracy.

arXiv.org e-Print Archive
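The core idea of a lock-free parallel perceptron can be sketched as follows. The training data is split into shards, each thread runs perceptron updates on its own shard, and all threads write to one shared weight vector without taking a lock. This is a simplified illustration with several assumptions:
- A binary linear classifier stands in for the graph-based parser, so there is no $O(n^3)$ decoding.
- The toy data and helper names are hypothetical.
- CPython's global interpreter lock interleaves these threads rather than running them truly in parallel, so the sketch shows the update pattern, not the speedup.

```python
import threading
from typing import List, Tuple

# (feature vector, label in {-1, +1}); a binary stand-in for parse structures.
Example = Tuple[List[float], int]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def _worker(w: List[float], shard: List[Example], mistakes: List[int], idx: int) -> None:
    """One perceptron pass over a shard, writing to the shared weights with no lock."""
    count = 0
    for x, y in shard:
        if y * dot(w, x) <= 0:        # misclassified or on the decision boundary
            for i, xi in enumerate(x):
                w[i] += y * xi        # lock-free write to the shared vector
            count += 1
    mistakes[idx] = count             # each thread writes only its own slot


def parallel_perceptron(data: List[Example], num_threads: int = 4,
                        max_epochs: int = 100) -> List[float]:
    w = [0.0] * len(data[0][0])
    shards = [data[t::num_threads] for t in range(num_threads)]
    for _ in range(max_epochs):
        mistakes = [0] * num_threads
        threads = [threading.Thread(target=_worker, args=(w, shards[t], mistakes, t))
                   for t in range(num_threads)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()
        if sum(mistakes) == 0:        # clean epoch: no thread changed w
            break
    return w


# Toy linearly separable data; the last feature is a constant bias term.
DATA: List[Example] = [
    ([2.0, 2.0, 1.0], 1), ([3.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1), ([2.5, 2.5, 1.0], 1),
    ([-1.0, -1.0, 1.0], -1), ([-2.0, 0.0, 1.0], -1), ([0.0, -2.0, 1.0], -1), ([-1.5, -0.5, 1.0], -1),
]

if __name__ == "__main__":
    print(parallel_perceptron(DATA))
```

The loop stops only after an epoch in which no thread made an update. The weights were therefore constant for that whole epoch, and every example was checked against them.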

A Generic Online Parallel Learning Framework for Large Margin Models

Authors: Ma Shuming, Sun Xu
Publication date: 02/03/2017

To speed up the training process, many existing systems use parallel technology for online learning algorithms. However, most research focuses mainly on stochastic gradient descent (SGD) rather than other algorithms. We propose a generic online parallel learning framework for large margin models, and also analyze our framework on popular large margin algorithms, including MIRA and Structured Perceptron. Our framework is lock-free and easy to implement on existing systems. Experiments show that systems with our framework can gain near-linear speedup as the number of running threads increases, with no loss in accuracy.

arXiv.org e-Print Archive
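The framework is described as generic over large margin models. One way to picture this is a lock-free training loop that takes the per-example update rule as a parameter. Below is a minimal sketch under these assumptions:
- The task is binary classification.
- The MIRA step is a passive-aggressive style 1-best update, clipped at a hypothetical constant `c`.
- All names are illustrative, not the authors' API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Example = Tuple[List[float], int]  # (features, label in {-1, +1})
# An update rule inspects one example and may modify the shared weights in place.
UpdateFn = Callable[[List[float], List[float], int], None]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_update(w: List[float], x: List[float], y: int) -> None:
    if y * dot(w, x) <= 0:
        for i, xi in enumerate(x):
            w[i] += y * xi


def make_mira_update(c: float = 1.0) -> UpdateFn:
    """Binary 1-best MIRA step (passive-aggressive style), step size capped at c."""
    def update(w: List[float], x: List[float], y: int) -> None:
        loss = max(0.0, 1.0 - y * dot(w, x))   # hinge loss with margin 1
        sq_norm = dot(x, x)
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(c, loss / sq_norm)        # smallest step restoring the margin, capped
            for i, xi in enumerate(x):
                w[i] += tau * y * xi
    return update


def train_lock_free(data: List[Example], update: UpdateFn,
                    num_threads: int = 4, epochs: int = 50) -> List[float]:
    w = [0.0] * len(data[0][0])
    shards = [data[t::num_threads] for t in range(num_threads)]

    def run_shard(shard: List[Example]) -> None:
        for x, y in shard:
            update(w, x, y)                     # no lock around the shared weights

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for _ in range(epochs):
            list(pool.map(run_shard, shards))   # wait for every shard before the next epoch
    return w
```

Swapping `perceptron_update` for `make_mira_update()` changes the learning algorithm without touching the parallel loop. In this sketch, that is what makes the framework generic.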

Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation

Authors: Lin Junyang, Ma Shuming, Su Qi, Sun Xu
Publication date: 06/02/2018

The attention-based sequence-to-sequence model has proved successful in Neural Machine Translation (NMT). However, attention that does not consider the decoding history, which includes the past information in the decoder and the attention mechanism, often causes much repetition. To address this problem, we propose the decoding-history-based Adaptive Control of Attention (ACA) for the NMT model. ACA learns to control the attention by keeping track of the decoding history and the current information with a memory vector, so that the model can take both the translated contents and the current information into consideration. Experiments on Chinese-English and English-Vietnamese translation demonstrate that our model significantly outperforms strong baselines. The analysis shows that our model generates translations with less repetition and higher accuracy. The code will be available at https://github.com/lancopk

arXiv.org e-Print Archive

Unsupervised Machine Commenting with Neural Variational Topic Model

Authors: Cui Lei, Ma Shuming, Sun Xu, Wei Furu
Publication date: 13/09/2018

Article comments can provide supplementary opinions and facts for readers, thereby making articles more attractive and engaging. Automatic commenting can therefore help increase activity in communities such as online forums and news websites. Previous work shows that training an automatic commenting system requires large parallel corpora. Although some articles are naturally paired with their comments on some websites, most articles and comments on the Internet are unpaired. To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to training an automatic article commenting model, relying on nothing but unpaired articles and comments. Our model is based on a retrieval-based commenting framework, which uses news articles to retrieve comments based on the similarity of their topics. The topic representation is obtained from a neural variational topic model, which is trained in an unsupervised manner. We evaluate our model on a news comment dataset. Experiments show that our proposed topic-based approach significantly outperforms previous lexicon-based models. The model also benefits from paired corpora and achieves state-of-the-art performance in semi-supervised scenarios.

arXiv.org e-Print Archive
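The retrieval step described above can be sketched as ranking candidate comments by how close their topic vectors are to the article's topic vector. The sketch makes several assumptions:
- The topic vectors have already been produced by a trained topic model; the neural variational topic model itself is not shown.
- Cosine similarity is used as the closeness measure.
- The vectors and comment ids in the example are made up.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def retrieve_comments(article_topics: Sequence[float],
                      comment_topics: Dict[str, Sequence[float]],
                      top_n: int = 1) -> List[str]:
    """Ids of the top_n comments whose topic vectors are closest to the article's."""
    ranked = sorted(comment_topics,
                    key=lambda cid: cosine(article_topics, comment_topics[cid]),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical topic distributions over three topics.
comments = {
    "c_sports": [0.7, 0.2, 0.1],
    "c_finance": [0.1, 0.8, 0.1],
    "c_weather": [0.1, 0.1, 0.8],
}
print(retrieve_comments([0.8, 0.1, 0.1], comments))  # ['c_sports']
```

Because no article-comment pairs are needed to compute the ranking, a retrieval step like this fits the unpaired setting. All supervision sits in the topic model that produces the vectors.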

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Authors: Ma Shuming, Ren Xuancheng, Sun Xu, Wang Houfeng
Publication date: 10/03/2019

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified so that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost (proportional to $k$ divided by the vector dimension). Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp. Comment: Accepted by the 34th International Conference on Machine Learning (ICML 2017).

arXiv.org e-Print Archive
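The top-$k$ sparsification can be illustrated on a single linear layer $y = Wx$. Here $W$ is stored as output-by-input, so the kept gradient components select rows of $W$. Below is a minimal pure-Python sketch of the backward pass under this layout assumption. The function names are hypothetical, and the linked repository's implementation may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def top_k_indices(values: List[float], k: int) -> List[int]:
    """Indices of the k entries with the largest magnitude."""
    return sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]


def meprop_linear_backward(W: Matrix, x: List[float], grad_y: List[float],
                           k: int) -> Tuple[Matrix, List[float]]:
    """Backward pass of y = W x that keeps only the top-k components of grad_y.

    Returns (grad_W, grad_x). Only the k selected rows of grad_W are nonzero,
    and the multiply-adds scale with k instead of len(grad_y). (This sketch
    still allocates the zero rows; an optimized version would not.)
    """
    kept = top_k_indices(grad_y, k)
    grad_W = [[0.0] * len(x) for _ in W]
    grad_x = [0.0] * len(x)
    for i in kept:
        g = grad_y[i]
        grad_W[i] = [g * xj for xj in x]   # row i of the outer product grad_y x^T
        for j, wij in enumerate(W[i]):
            grad_x[j] += wij * g           # W^T grad_y, restricted to kept rows
    return grad_W, grad_x


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gW, gx = meprop_linear_backward(W, [2.0, 3.0], [0.1, -0.5, 0.3], k=1)
print(gW)  # [[0.0, 0.0], [-1.0, -1.5], [0.0, 0.0]]
print(gx)  # [0.0, -0.5]
```

With `k=1`, only the component with the largest magnitude (here `-0.5`) survives. Only row 1 of the weight gradient is touched.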

Bag-of-Words as Target for Neural Machine Translation

Authors: Lin Junyang, Ma Shuming, Sun Xu, Wang Yizhong
Publication date: 13/05/2018

A sentence can be translated into more than one correct sentence. However, most existing neural machine translation models use only one of the correct translations as the target, and the other correct sentences are penalized as incorrect during training. Since most of the correct translations of a sentence share a similar bag-of-words, it is possible to distinguish the correct translations from the incorrect ones by their bag-of-words. In this paper, we propose an approach that uses both the sentences and the bag-of-words as targets during training, in order to encourage the model to generate potentially correct sentences that do not appear in the training set. We evaluate our model on a Chinese-English translation dataset, and experiments show that our model outperforms the strong baselines by 4.55 BLEU points. Comment: accepted by ACL 201

arXiv.org e-Print Archive
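One plausible way to use a bag-of-words as an extra target works in two steps. First, average the decoder's per-step word distributions into a single sentence-level distribution and penalize low probability on each word in the reference's bag-of-words. Then add this penalty to the usual sentence-level loss. The averaging, the `weight` parameter, and the function names are assumptions for illustration; the abstract does not specify the exact formulation.

```python
import math
from typing import List, Set


def bag_of_words_loss(step_probs: List[List[float]], target_bow: Set[int],
                      eps: float = 1e-12) -> float:
    """Hypothetical BoW loss: negative log of each target word's probability
    under the average of the per-step output distributions."""
    num_steps = len(step_probs)
    vocab_size = len(step_probs[0])
    avg = [sum(p[v] for p in step_probs) / num_steps for v in range(vocab_size)]
    # eps guards against log(0) for words the decoder never predicts
    return -sum(math.log(max(avg[v], eps)) for v in target_bow)


def combined_loss(sentence_nll: float, bow_loss: float, weight: float = 1.0) -> float:
    """Sentence-level loss plus a weighted bag-of-words term."""
    return sentence_nll + weight * bow_loss


# Two decoding steps over a 3-word vocabulary; the reference uses words 0 and 1.
probs = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]
print(bag_of_words_loss(probs, {0, 1}))  # 2 * ln 2
```

The bag-of-words term ignores word order. A hypothesis that uses the reference's words in a different arrangement is therefore not penalized by this term, even though the sentence-level term may still penalize it.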

Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network

Authors: Li Wei, Ma Shuming, Sun Xu, Yang Pengcheng
Publication date: 10/05/2018

As more and more academic papers are submitted to conferences and journals, having professionals evaluate all of them is time-consuming and can cause inequality due to the personal factors of the reviewers. In this paper, to assist professionals in evaluating academic papers, we propose a novel task: automatic academic paper rating (AAPR), which automatically determines whether to accept academic papers. We build a new dataset for this task and propose a novel modularized hierarchical convolutional neural network for automatic academic paper rating. Evaluation results show that the proposed model outperforms the baselines by a large margin. The dataset and code are available at https://github.com/lancopku/AAPR. Comment: Accepted by ACL 201

arXiv.org e-Print Archive

Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization

Authors: Lin Junyang, Ma Shuming, Sun Xu, Wang Houfeng
Publication date: 13/05/2018

Most current abstractive text summarization models are based on the sequence-to-sequence model (Seq2Seq). The source content of social media is long and noisy, so it is difficult for Seq2Seq to learn an accurate semantic representation. Compared with the source content, the annotated summary is short and well written. Moreover, it shares the same meaning as the source content. In this work, we supervise the learning of the representation of the source content with that of the summary. In our implementation, we regard a summary autoencoder as an assistant supervisor of Seq2Seq. Following previous work, we evaluate our model on a popular Chinese social media dataset. Experimental results show that our model achieves state-of-the-art performance on the benchmark dataset. Comment: accepted by ACL 201

arXiv.org e-Print Archive

Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?

Authors: Ma Shuming, Ren Xuancheng, Sun Xu, Yang Yang, Zhang Yi
Publication date: 12/06/2018

Existing neural models usually predict the tag of the current token independently of the neighboring tags. The popular LSTM-CRF model considers the dependencies between every two consecutive tags. However, it is hard for existing neural models to take longer-distance tag dependencies into consideration. Their scalability is mainly limited by complex model structures and the cost of dynamic programming during training. In our work, we first design a new model called "high order LSTM", which predicts for the current token a combined tag containing not only the current tag but also the previous several tags. We call the number of tags in one prediction its "order". We then propose a new method called Multi-Order BiLSTM (MO-BiLSTM), which combines low order and high order LSTMs. MO-BiLSTM remains scalable to high order models by using a pruning technique. We evaluate MO-BiLSTM on all-phrase chunking and NER datasets. Experimental results show that MO-BiLSTM achieves the state-of-the-art result in chunking and highly competitive results on two NER datasets. Comment: Accepted by COLING 201

arXiv.org e-Print Archive
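The "order" of a prediction can be made concrete by converting an ordinary tag sequence into high-order tags. Each token's label becomes the tuple of its own tag and the preceding tags. The small sketch below assumes a hypothetical `<S>` padding tag before the sentence start. It only shows how the labels are built, not the LSTM or the pruning.

```python
from typing import List, Tuple

PAD = "<S>"  # hypothetical padding tag for positions before the sentence start


def to_high_order(tags: List[str], order: int) -> List[Tuple[str, ...]]:
    """Map position i to the tuple (tag[i-order+1], ..., tag[i])."""
    padded = [PAD] * (order - 1) + tags
    return [tuple(padded[i:i + order]) for i in range(len(tags))]


def current_tags(high_order: List[Tuple[str, ...]]) -> List[str]:
    """Recover the ordinary tag sequence from the last element of each tuple."""
    return [combo[-1] for combo in high_order]


tags = ["B-NP", "I-NP", "B-VP"]
print(to_high_order(tags, 2))
# [('<S>', 'B-NP'), ('B-NP', 'I-NP'), ('I-NP', 'B-VP')]
```

With T distinct tags, an order-k label can take up to T^k values. This growth in the label space helps explain why scaling to high orders calls for a technique such as pruning.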